MAP estimation of subspace transform for speaker recognition
Authors
Abstract
We propose maximum a posteriori (MAP) estimation of a subspace transform for speaker recognition. The linear transform is defined on the mean vectors of a Gaussian mixture model (GMM), with transform matrices and bias vectors associated with separate regression classes so that both can be estimated from sufficient statistics given limited training data. Each transform matrix is further defined as a linear combination of a set of basis transforms, so the combination weights are the parameters to be estimated. We characterize speakers by their transform parameters and model them with a support vector machine (SVM). Experiments on the 2008 NIST SRE task illustrate the effectiveness of the method.
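The structure of the transform described in the abstract can be sketched as follows. This is a minimal illustration only: the function names, the two-dimensional toy values, and the way the parameters are stacked into a feature vector are assumptions made for the example, not details taken from the paper, and the MAP estimation of the weights and biases is not shown.

```python
# Hypothetical sketch of a subspace transform on GMM mean vectors.
# All names, shapes, and values are illustrative assumptions.

def combine_bases(weights, bases):
    """Build a transform matrix A = sum_k weights[k] * bases[k]."""
    rows, cols = len(bases[0]), len(bases[0][0])
    return [
        [sum(w * basis[i][j] for w, basis in zip(weights, bases))
         for j in range(cols)]
        for i in range(rows)
    ]

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def adapt_means(means, matrix_class_of, bias_class_of,
                weights_per_class, bias_per_class, bases):
    """Apply mu' = A_c * mu + b_d to every Gaussian mean.

    c is the matrix regression class and d the bias regression class of
    the Gaussian; the two class assignments are kept separate.
    """
    adapted = []
    for mean, c, d in zip(means, matrix_class_of, bias_class_of):
        transform = combine_bases(weights_per_class[c], bases)
        rotated = matvec(transform, mean)
        adapted.append([r + b for r, b in zip(rotated, bias_per_class[d])])
    return adapted

def speaker_feature(weights_per_class, bias_per_class):
    """Stack all transform parameters into one vector (e.g. SVM input)."""
    feature = []
    for weights in weights_per_class:
        feature.extend(weights)
    for bias in bias_per_class:
        feature.extend(bias)
    return feature

# Toy setup: two basis transforms, two Gaussians, one matrix class,
# two bias classes.
bases = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
means = [[1.0, 2.0], [3.0, 4.0]]
matrix_class_of = [0, 0]
bias_class_of = [0, 1]
weights_per_class = [[1.0, 0.5]]
bias_per_class = [[0.0, 0.0], [1.0, -1.0]]

adapted = adapt_means(means, matrix_class_of, bias_class_of,
                      weights_per_class, bias_per_class, bases)
feature = speaker_feature(weights_per_class, bias_per_class)
```

In this toy setup both Gaussians share one set of basis weights while each has its own bias class, which mirrors the idea that a coarser class structure for the matrices leaves fewer parameters to estimate from limited data.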
Similar resources
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Factor analysis subspace estimation for speaker verification with short utterances
Training the speaker and session subspaces is an integral problem in developing a joint factor analysis GMM speaker verification system. This work investigates and compares several alternative procedures for this task with a particular focus on training and testing with short utterances. Experiments show that better performance can be obtained when an independent rather than simultaneous optimi...
An investigation into subspace rapid speaker adaptation for verification
Rapid speaker adaptation is becoming more important in emerging applications where storage, computation and training utterances are at a premium (e.g. PDAs, cell phones). Effective adaptation can be achieved for the task of speaker verification, based on a maximum a posteriori (MAP) learning framework, by restricting the client’s parametric model to be a linear combination of parameters estimat...
Incorporating MAP estimation and covariance transform for SVM based speaker recognition
In this paper, we apply Constrained Maximum a Posteriori Linear Regression (CMAPLR) transformation on Universal Background Model (UBM) when characterizing each speaker with a supervector. We incorporate the covariance transformation parameters into the supervector in addition to the mean transformation parameters. Maximum Likelihood Linear Regression (MLLR) covariance transformation is adopted....